Classification of Document Page Images
Abstract
Searching a large, heterogeneous collection of scanned document images often produces uncertain results, in part because of the size of the collection and the inability to focus queries appropriately. Searching for documents by their type is a natural way to enhance the effectiveness of document retrieval in the workplace [2], and such a system is proposed in [4]. The goal of our work is to build classifiers that can determine the type or genre of a document image. We primarily use layout features, since the layout of a document contains a significant amount of information that can be used to identify its type. Layout analysis is necessary because the input image has no structural definition that is immediately perceivable by a computer. Classification is thus based on the “visual similarity” of the structure, without reference to models of particular kinds of pages. Some classification work has been reported, but most of it either requires domain-specific models [3, 5, 6, 8] or is based on text obtained by optical character recognition (OCR) [3, 6, 8].
Similar resources
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and the document image is segmented into a set of regions. In the high-resolution page segmentation, each region is segmented into homogeneous regions by connected components analysis and identifyi...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
Issues in Constructing Document Thumbnails for Page-Image Digital Libraries
Digital libraries are increasingly based on digital page images of documents, but techniques for constructing usable versions of these page images are still largely folklore. This paper documents some issues encountered in creating document icons, page thumbnails, and page images for the UpLib digital library system, and suggests answers for each of them, based on both problem analysis and user...
Document Icons and Page Thumbnails: Issues in Construction of Document Thumbnails for Page-Image Digital Libraries
Digital libraries are increasingly based on digital page images, but techniques for constructing usable versions of these page images are largely folklore. This paper documents some issues encountered in creating various kinds of renderings of page images for the UpLib digital library system, and suggests approaches for each, based on both problem analysis and user feedback. Several factors imp...
Page Layout Classification Technique for Biomedical Documents
The structural layout information of scanned document pages is valuable for a wide range of document processing applications such as automatic document searching, document delivery and automated data entry. This paper describes the classification of scanned document pages into different classes of physical layout structures. The page layout classification technique proposed in this paper uses a...
Publication date: 1999